Bayesian Data Analysis of Gambling Preferences


Introduction


Bayesian data analysis has been feasible since 1763, when Rev. Thomas Bayes
formulated his theorem (which is just a straightforward application of the
definition of conditional probability):


$$
P(H_i \mid D) = \frac{P(D \mid H_i)\, P(H_i)}{\sum_j P(D \mid H_j)\, P(H_j)}
$$

where the denominator is $P(D)$, the overall probability of the data.


Despite its availability for such a long time, research workers have made
little use of it. Even most researchers who consider themselves Bayesians
have used it only as a normative model for human information processing
but not for processing data, although Edwards, Lindman & Savage (1963)
pointed out its advantages for statistical inference almost 10 years ago, and
although easily readable textbooks are now available (e.g., Hays & Winkler,
1970, have a long chapter on Bayesian inference, and the books by McGee (1971)
and Winkler (1972) are especially devoted to these procedures).

Bayesian statistics differs from traditional statistics in using information
not contained in the sample, namely, $P(H)$, the prior probability of the
hypothesis. In testing hypotheses, traditional statisticians use only
$P(D \mid H)$, rejecting a hypothesis $H$ when $P(D \mid H)$ plus the
probability of more extreme data is below a certain prefixed level $\alpha$.

Traditional statisticians have occasionally objected to the idea of
taking into account any prior information, like $P(H)$, which was not obtained
from an observed sample. Those who use Bayesian methods but insist upon priors
inferred from previous observations rather than intuition call themselves
Empirical Bayesians (e.g., Maritz, 1970).

In a sense, Bayesian statistics can be viewed as an extension of traditional
statistics; it uses the same information plus something more, namely
prior probabilities, under the assumption that all information available
should be used for decisions among competing hypotheses. Actually, according
to the principle of stable estimation, even strongly biassed priors cannot do
much harm to the posteriors as long as the data used for their revision have
enough diagnostic impact, and as long as the prior distribution is not too
small in the region favored by the data, and/or not too peaked elsewhere.
(For more details about the principle of stable estimation, see Edwards,
Lindman & Savage, 1963.) Thus, the arbitrary and intuitive nature of prior
distributions does not constitute a reason for not using Bayesian statistical
methods.

It  is  probably  easy  to  show  that  every  scientist  observing  and  analyzing 
data  has  some  priors  with  respect  to  his  hypotheses — however,  to  discuss  this 
is  not  the  point  of  this  paper,  and  the  reader  interested  in  these  problems  is 
referred,  e.g.,  to  Kuhn  (1962).  Convenient  techniques  to  elicit  and  assess 
the  scientist's  prior  probability  distributions  over  hypotheses  are  available; 
some  of  them  are  described,  e.g.,  in  Winkler  (1967)  and  Stael  von  Holstein 
(1970). 

In this paper, we pay little attention to prior distributions over
hypotheses. We will rather concentrate on likelihoods $P(D \mid H_i)$, which
are more public and less controversial than the priors $P(H_i)$.

Usually, a hypothesis to be tested in traditional statistics implies that
a certain parameter value obtains, e.g., in traditional null hypothesis
testing the hypothesis is $H_0: \theta = \theta_0$ for some parameter
$\theta$, which is tested against the rather diffuse alternative that
$\theta \neq \theta_0$. In most cases, traditional statisticians cannot figure
a probability for the data observed given this diffuse alternative hypothesis,
and therefore $\beta$, the probability of a type II error, is left unknown.

In such a case, the Bayesian usually would not consider a point hypothesis
$\theta = \theta_0$ as opposed to a continuum of other values of $\theta$, but
rather would assess a continuous prior distribution over the whole parameter
space, which is then treated as a continuous set of hypotheses. The evidence
from the sample observed would then be used to revise this continuous prior
distribution over the parameter space according to Bayes's theorem, which
reads for the continuous case:


$$
f(\theta \mid x) = \frac{g(x \mid \theta)\, f(\theta)}{\int g(x \mid \theta')\, f(\theta')\, d\theta'}
$$


and gives a continuous posterior distribution over the same parameter space.
Although Bayesian statistics can handle any number of competing hypotheses
simultaneously (up to an infinite number, which is the continuous case
discussed just above), the most convenient case deals with only two competing
hypotheses, such as the traditional test of $H_0$ against its alternative, the
catch-all hypothesis. The advantage of testing only two hypotheses against
each other in Bayesian analysis is that Bayes's theorem can then be written in
ratio form so that $P(D)$ cancels out:


$$
\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(H_1)}{P(H_2)} \cdot \frac{P(D \mid H_1)}{P(D \mid H_2)}
$$


This is known as the odds-likelihood-ratio form of Bayes's theorem:

$$
\Omega_D = \Omega_0 \cdot LR(D);
$$

in words:

posterior odds = prior odds x likelihood ratio.


For conditionally independent data, the likelihood for the whole set of data
$D = (d_1, d_2, \ldots, d_m)$ is the product of the likelihoods of the
individual data,

$$
P(D \mid H_i) = \prod_{j=1}^{m} P(d_j \mid H_i),
$$

and then the odds-likelihood-ratio equation becomes:

$$
\Omega_D = \Omega_0 \prod_{j=1}^{m} LR(d_j).
$$


Bayesian data analysis with these formulae is easy, straightforward, and
efficient if you have perfect knowledge of the data generating process which
gives you $P(D \mid H)$, but can be quite a problem if you don't.
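
The odds-likelihood-ratio form can be illustrated with a minimal sketch. The function names and the numbers in the usage line are hypothetical and chosen only for illustration; the sketch assumes conditionally independent observations and works in logarithms so that long products do not underflow.

```python
import math

def posterior_odds(prior_odds, likelihoods_h1, likelihoods_h2):
    """Posterior odds of H1 over H2 for conditionally independent data.

    likelihoods_h1[j] and likelihoods_h2[j] are P(d_j | H1) and P(d_j | H2)
    for the same observation d_j (all strictly between 0 and 1).
    """
    if len(likelihoods_h1) != len(likelihoods_h2):
        raise ValueError("both hypotheses need one likelihood per datum")
    # Sum of log likelihood ratios = log of the product of the ratios.
    log_lr = sum(math.log(p1) - math.log(p2)
                 for p1, p2 in zip(likelihoods_h1, likelihoods_h2))
    return prior_odds * math.exp(log_lr)

def odds_to_probability(odds):
    """Convert odds in favor of H1 into the probability of H1."""
    return odds / (1.0 + odds)

# Hypothetical example: equal priors (odds 1), three observations.
example_odds = posterior_odds(1.0, [0.8, 0.6, 0.9], [0.5, 0.5, 0.5])
print(example_odds, odds_to_probability(example_odds))
```

Working with sums of logarithms rather than products is the same convenience the analyses in the following sections rely on.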


Bayesian  Analysis  of  Learning  Data 


Let's look at an easy case first: comparisons of learning models are excellent
examples for Bayesian data analyses. E.g., Restle & Greeno (1970) compare a
linear operator model ($H_1$) by Bower (1961) (also, see Atkinson,
Bower & Crothers, 1965, p. 91),


$$
P_n(c \mid H_1) = a - (a - b)(1 - \theta_1)^{n-1}
$$

and an accumulative model ($H_2$)

$$
P_n(c \mid H_2) = \frac{b + \theta_2\, a\,(n - 1)}{1 + \theta_2\,(n - 1)}
$$

where $P_n(c \mid H_i)$ is the probability of a correct response on trial $n$
under the respective models, $\theta_i$ is a parameter of the learning curve,
and $a$ and $b$ are the asymptotic and initial success probabilities,
respectively. Corresponding probabilities of wrong responses (errors) are
$P_n(e \mid H_i) = 1 - P_n(c \mid H_i)$.
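
As a rough illustration of the two learning curves, the following sketch evaluates both models trial by trial. The function names and the values of theta are hypothetical placeholders (they are not the estimates used by Restle & Greeno); a = 1 and b = .5 are taken as defaults because those are the values assumed later in the text.

```python
def linear_operator(n, theta, a=1.0, b=0.5):
    """P_n(c | H1): success probability on trial n under the linear operator model."""
    return a - (a - b) * (1.0 - theta) ** (n - 1)

def accumulative(n, theta, a=1.0, b=0.5):
    """P_n(c | H2): success probability on trial n under the accumulative model."""
    return (b + theta * a * (n - 1)) / (1.0 + theta * (n - 1))

# Hypothetical parameter values, for illustration only.
THETA_1, THETA_2 = 0.3, 0.5
for trial in range(1, 12):
    p1 = linear_operator(trial, THETA_1)
    p2 = accumulative(trial, THETA_2)
    # Error probabilities are the complements 1 - p1 and 1 - p2.
    print(trial, round(p1, 3), round(p2, 3))
```

Both curves start at b on trial 1 and approach a as n grows, which is a quick sanity check on the reconstructed formulas.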

Bower (1961) had 29 Ss learn a list of 10 items, "to a criterion of 2
consecutive errorless cycles. A response was obtained from the S on each
presentation of an item" (p. 528). Stimuli were pairs of consonant letters,
responses were the integers 1 and 2, each of them assigned to 5 of the stimuli.

Twenty-nine Ss times 10 items makes 290 observations on each trial (unless
some Ss did not get to the last trials because they completed their two
errorless cycles earlier). The data Bower obtained, in terms of relative
frequencies of correct responses on the n-th trial, are reproduced in Table 1,
column 2, from Restle & Greeno (1970, p. 8).

To evaluate the two competing learning theories $H_1$ and $H_2$ given the
evidence from these data, Restle & Greeno (1970) assumed a = 1 and b = .5,
estimated $\theta_i$ from the data, and calculated $P_n(c \mid H_i)$ using
these parameter estimates. The resulting $P_n(c \mid H_1)$,
$P_n(c \mid H_2)$, and corresponding $P_n(e \mid H_1)$ and $P_n(e \mid H_2)$
are reproduced in columns 3-6 of Table 1. Restle & Greeno then compared the
two models by calculating the sum

$$
\Delta_i = \sum_n \left( P_n(c \mid H_i) - P_n(c\ \text{observed}) \right)^2
$$

for both models (i = 1, 2). $\Delta_1$ was .0042, $\Delta_2$ was .011,
indicating a better fit of $H_1$.

A Bayesian data analysis would consist of calculating likelihood ratios
$P_n(c \mid H_1) / P_n(c \mid H_2)$ for each correct response observed, and
$P_n(e \mid H_1) / P_n(e \mid H_2)$ for each error response, and multiplying
them all together to get the overall likelihood ratio.

To do so, we need absolute frequencies of errors and correct responses
on the 11 trials, which are not given in Restle & Greeno's book, nor in
Bower's paper. We reconstructed them by multiplying the relative frequencies
given in Restle & Greeno (column 2 in Table 1) by 290 (29 Ss times 10 items),
resulting in the absolute frequencies of correct responses $f_n(c)$ and of
errors $f_n(e)$ reproduced in columns 11 and 12 of Table 1. (These estimates
may contain some errors if some Ss quit before reaching the 11th trial because
they had completed their two errorless cycles earlier.)

For convenience, the calculation of $LR(d_j)$ and $LR(D)$ is performed in
logarithms: In column 13, we have

$$
\log LR(D_n) = f_n(c)\,[\log P_n(c \mid H_1) - \log P_n(c \mid H_2)]
             + f_n(e)\,[\log P_n(e \mid H_1) - \log P_n(e \mid H_2)],
$$

and

$$
\log LR(D) = \sum_n \log LR(D_n),
$$

with the respective logarithms in columns 7 through 10, and observed
frequencies $f_n(c)$ and $f_n(e)$ in columns 11 and 12.
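
The column arithmetic just described can be sketched as follows. The function names and the two example trials are hypothetical (Table 1 itself is not reproduced here); base-10 logarithms are assumed, since a log likelihood ratio of 9.0253 corresponds to a ratio of about a billion only in base 10.

```python
import math

def log10_lr_trial(f_correct, f_error, p_correct_h1, p_correct_h2):
    """log10 LR(D_n) for one trial from frequencies and model probabilities."""
    p_error_h1 = 1.0 - p_correct_h1
    p_error_h2 = 1.0 - p_correct_h2
    return (f_correct * (math.log10(p_correct_h1) - math.log10(p_correct_h2))
            + f_error * (math.log10(p_error_h1) - math.log10(p_error_h2)))

def log10_lr_total(trials):
    """Sum of the per-trial log likelihood ratios, i.e. log10 LR(D)."""
    return sum(log10_lr_trial(*trial) for trial in trials)

# Hypothetical trials: (f_n(c), f_n(e), P_n(c|H1), P_n(c|H2)).
example_trials = [(150, 140, 0.50, 0.50), (190, 100, 0.65, 0.60)]
total = log10_lr_total(example_trials)
print(total, 10 ** total)  # log10 LR(D) and the likelihood ratio itself
```

A trial on which both models predict the same probability contributes exactly zero, so only trials where the models disagree carry evidence.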


The resulting log LR(D) is 9.0253, indicating a likelihood ratio LR(D)
over a billion: $LR(D) \approx 1.061 \times 10^9$. I.e., if we had assumed
equal priors, $P(H_1) = P(H_2) = .5$, this would mean that $H_1$ is over a
billion times more likely than $H_2$.


Although this could be taken as strong evidence for the principle of
stable estimation (even very heavily biassed priors would have been corrected
by such a large likelihood ratio), we have to consider it with some
reservation.

As  we  pointed  out  already,  it  is  doubtful  if  we  can  actually  assume  290 
observations  in  the  last  trials  (7-11)  because  some  Ss  may  have  quit  earlier. 
Reduction  of  the  numbers  of  observations  in  the  last  trials  would  reduce  LR(D) 
considerably  because  trials  n  =  7  through  n  =  11  contribute  most  to  LR(D), 
except  for  n  =  2. 

Unfortunately, the original complete data are no longer available. However,
a letter from Bower assures that these figures actually can be taken as
numbers of correct responses assuming that the subjects would not make any
more errors had they continued after their last two errorless cycles.

Another question is whether we really can assume independence of
observations enabling us to multiply likelihoods. Although the observations
themselves are clearly obtained independently, the independence assumption for
the conditional probabilities $P_n(d_j \mid H_i)$ might not hold.


A way out of this might be not to calculate the whole learning curve for
each model, but rather just to predict $P_{n+1}(d_j \mid H_i)$ from the $p_n$
(observed so far) by

$$
P_{n+1}(c \mid p_n, H_1) = (1 - \theta_1)\, p_n + \theta_1 a,
$$

and

$$
P_{n+1}(c \mid p_n, H_2) =
\frac{R_n + a\,\theta_2\,(R_n + W_n)}
     {\left(R_n + a\,\theta_2\,(R_n + W_n)\right) + \left(W_n + (1 - a)\,\theta_2\,(R_n + W_n)\right)}
$$

In Model 2, this requires an additional assumption about $R_1$ and $W_1$; we
used $R_1 = W_1 = 5$ for the calculation of $P_2(c \mid p_1, H_2)$. Actually,
the choice of $W_1 = R_1$ does not make much of a difference.

We use this example to demonstrate a slightly different way of performing
the data analysis: In Table 1 we took logarithms of
$P_n(c \mid p_{n-1}, H_i)$ and $P_n(e \mid p_{n-1}, H_i)$ for i = 1, 2, and
then subtracted the logarithms of these probabilities for i = 2 from those for
i = 1 (multiplied by the respective numbers of observations); in Table 2 we
calculate the likelihood ratios for correct responses and errors directly (by
dividing the hit probabilities in column 5, and by dividing the error
probabilities in column 6 by those in column 7 to yield column 8), and then
take the logarithms of these likelihood ratios for hits and errors (columns 10
and 12), multiply them by the respective numbers of observations (columns 9
and 11), and sum over these products.

The log likelihood ratio is now "only" 2.2508, indicating a likelihood
ratio of 178.2 in favor of Model 1. Of course, taking into account the observed
number of correct responses on the previous trial in each calculation of
$P_n(c \mid H_i, p_{n-1})$ brings these probabilities under both models closer
to the actual data, and thus levels out differences between them. The
resulting likelihood ratio is still large enough to correct even strongly
biassed prior odds against Model 1, and now it takes conditioned
non-independence into account.

The analysis could be further improved by using maximum likelihood estimates
for $\theta_i$ rather than the least squares estimates we took from Restle &
Greeno (1970) for this demonstration. However, since the evaluation of
learning models is not our main concern in this paper, we will now turn to
analyses of choice-among-gambles data.

Bayesian  Analysis  of  Gambling  Preferences 

As we have seen, Bayesian data analyses are quite straightforward with models
that provide us explicit probabilities of occurrence between 0 and 1 for each
event we might observe. We have taken learning curves as an example; other
feasible examples could be taken from psychophysics, signal detection theory,
Lucean & Thurstonean choice theories, etc.

However, in analyzing gambling preference data we encounter different
problems, particularly with deterministic choice models. Since they require
deterministic choices, i.e., with probabilities 0 and 1, no Bayesian data
analysis is feasible under these assumptions. This may be one of the reasons
why decision analysts and other scientists strongly advocating Bayesian
procedures as normative models for human information processing rather seldom
use Bayesian methods in their data analyses: they mostly favor deterministic
choice models which prevent them from applying their own principles.


We are going to illustrate Bayesian data analyses of choice-among-gambles
data on two sets of data here, both borrowed from colleagues: one is from an
experiment by Hommers (1973) with normal and educable retarded children of 8,
10, 12, and 14 years of age, where it seems rather appropriate to replace the
deterministic normative model by a probabilistic one; the other set of data
is from an experiment by Seghers, Fryback & Goodman (1973) with adult subjects,
where the conventional (Lucean) probabilistic choice models might indicate
too weak preferences as compared to the choice probabilities inferred from
the data.

Hommers' Data

Hommers (1973) in his dissertation compares choices among bets made by 8,
10, and 12 years old normal children, and 8, 10, 12 and 14 years old educable
retarded children. Each set of gambles presented as choice alternatives to
the S consisted of 3 bets labelled W, L, and S, respectively, where W indicates
the choice with the largest amount to be won but with the smallest winning
probability, S the one with the largest winning probability but the smallest
amount, and L had medium probability and payoff. Table 3 shows winning
probabilities (P), payoffs (V), and expected values (EV) for the three choice
alternatives W, L, and S of each of Hommers' 15 stimuli. Stimuli were presented
to Ss in the form of index cards showing sets of "winning" and "not winning"
balls in urns, and displaying the amounts to be won in coins. Subjects made
their choice by indicating their favored gamble, which was played thereafter.
About half of the Ss in each age and school level had previous experience with
choices on stimulus cards with two choice alternatives, so that there are three
independent variables: school level (normal vs. educable retarded), age level,
and prior gambling experience vs. no prior gambling experience.

Hommers' data, i.e., frequencies of choices of the alternatives W, L, and
S of the 13 stimuli in the 14 groups, are displayed in Table 4. Hommers'
analysis of these data consisted of chi square comparisons between these
figures, testing various hypotheses about differences in the development of
risk vs. safety orientation and EV maximization between the age groups tested
and between the normal and educable retarded children.

Table 3: Hommers' (1973) stimuli: three-alternative choices among bets

However, since it is assumed that these children follow some probabilistic
choice model, it is feasible to apply a BTL choice model to these data, and
do a likelihood ratio analysis. Three probabilistic choice models derived
from Hommers' hypotheses seem to be naturally applicable in this situation: Ss
are either (1) safety oriented, i.e., focussing on the probability of winning,
and thus should choose the alternatives with probabilities proportional to
their respective winning probabilities, or (2) they are value oriented, and
choose with probabilities proportional to the payoffs, or (3) they are
expected-value oriented, and choose with probabilities proportional to the
expected values of the alternatives. All wins and expected values are positive.
Choice probabilities for the alternatives W, L, and S of each stimulus are
calculated under the assumption of each of these three models, and displayed
in Table 5. In these computations, use has been made of the "auxiliary sums"
in the last three columns of Table 3; e.g., in stimulus 1, the sum of the EV
of the three choice alternatives is 11.0 (= 1.5 + 5.0 + 4.5), and thus, under
assumption of EV orientation, the choice probabilities of alternatives W, L,
and S are 1.5/11.0 = .136, 5.0/11.0 = .455, and 4.5/11.0 = .409, respectively.
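
The proportional choice rule used for all three models can be sketched in a few lines. The function name is hypothetical; the only input taken from the text is the set of expected values quoted for stimulus 1 (1.5, 5.0, 4.5), and the rule is only defined for strictly positive scale values.

```python
def proportional_choice_probs(scale_values):
    """Choice probabilities proportional to positive scale values (BTL-type rule)."""
    if any(v <= 0 for v in scale_values):
        raise ValueError("scale values must be strictly positive")
    total = sum(scale_values)  # the "auxiliary sum" for this stimulus
    return [v / total for v in scale_values]

# Expected values of alternatives W, L, and S of stimulus 1 as quoted in the text.
ev_stimulus_1 = [1.5, 5.0, 4.5]
print([round(p, 3) for p in proportional_choice_probs(ev_stimulus_1)])
```

Passing winning probabilities or payoffs instead of expected values gives the safety-oriented and value-oriented models, respectively.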

For convenience, the choice probabilities have been converted into
logarithms in the right half of Table 5. As in the previous examples, we
again assume independence of observations, so that the likelihood of the whole
set of data (observed choice frequencies) or of parts thereof is equal to the
product of choice probabilities under assumption of the various models. In
logarithms, this means multiplying the choice frequencies from Table 4 by the
logarithms of choice probabilities from Table 5, and then summing up over
alternatives and stimuli for each model. The antilog of this sum is the
likelihood of the data set under the specified hypothesis or model. These
likelihoods can be compared pairwise between models (but only for the same
data set); however, the resulting likelihood ratios can be compared between
data sets, i.e., between the different experimental groups.
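
A minimal sketch of this computation, assuming independent observations, is given below. The function names, frequencies, and model probabilities are hypothetical stand-ins for the entries of Tables 4 and 5, which are not reproduced here; base-10 logarithms are an assumption carried over from the learning example.

```python
import math

def log10_likelihood(frequencies, probabilities):
    """Sum of frequency x log10(choice probability) over stimuli and alternatives.

    frequencies[s] and probabilities[s] hold the values for alternatives
    (W, L, S) of stimulus s; all probabilities must be strictly positive.
    """
    total = 0.0
    for freq_row, prob_row in zip(frequencies, probabilities):
        for f, p in zip(freq_row, prob_row):
            total += f * math.log10(p)
    return total

def log10_likelihood_ratio(frequencies, probs_model_a, probs_model_b):
    """log10 of the likelihood ratio of model A to model B on the same data."""
    return (log10_likelihood(frequencies, probs_model_a)
            - log10_likelihood(frequencies, probs_model_b))

# Hypothetical data: two stimuli, choice counts for W, L, S in one group.
freqs = [(2, 5, 3), (1, 6, 3)]
model_ev = [(0.136, 0.455, 0.409), (0.20, 0.50, 0.30)]
model_safety = [(0.10, 0.30, 0.60), (0.15, 0.35, 0.50)]
print(log10_likelihood_ratio(freqs, model_ev, model_safety))
```

Because every choice probability is below 1, each log likelihood is negative; only the difference between two models on the same data set is interpretable.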

For some of Hommers' (1973) data, this has been done in Tables 6-9. The
sums in the bottom rows are the logarithms of the likelihoods (probabilities)
of the respective data, assuming that the probabilities of individual choices
are generated by the models named on top of the columns. Of course, they are
all negative; the larger their absolute value, the smaller the probability of
the data under the respective model.

In  the  order  of  their  likelihoods,  we  get  from  the  four  groups  analyzed 
the  following  likelihood  ratios  between  pairs  of  models  (see  Table  10). 
Similar analyses could be performed for the other 10 of Hommers' 14 groups
too. We have displayed in the rightmost column of Table 10 the rank order of
models as indicated by the likelihood ratios calculated from the data;
although the likelihood ratios themselves differ considerably, it is interesting
to  note  that  IT  year  old  retarded  children  show  the  same  rank  order  of  models 

as  the  8  year  old  normal  children,  thus  supporting  Hommers'  hypothesis  of 
retardation  as  a  shift  in  development.  Also,  comparison  of  the  results  from 
12  year  old  educable  retarded  children  without  gambling  experience  with  those 
from  their  classmates  with  prior  gambling  experience  unveils  a  considerable 
influence  of  this  experience  on  choices  among  gambles. 

Besides  these  analyses  for  individual  groups,  larger  groups  can  be  taken 
into  consideration,  e.g.,  likelihood  ratios  between  models  can  be  calculated 
over  all  Ss  with  prior  gambling  experience,  or  over  all  retarded  children  to 
be  compared  to  those  calculated  over  all  normal  children,  etc.  Since  we  used 
these  data  only  for  illustrative  purposes,  we  need  not  go  into  further  detail. 
Also, we will turn to the problem of interpretation of such analyses later in


Seghers, Fryback & Goodman's Data


The next set of data we are going to use are those of Seghers, Fryback &
Goodman (1973). They presented their Ss sets of 7 gambles, like those
reproduced in Table 11:

Table 11: List #1 as an example

bet #   win on 4   lose on 32      EV    variance
  1       1.55        1.10       -.806      .683
  2       3.45        1.15       -.639     2.088
  3       5.30        1.20       -.478     4.469
  4       7.15        1.25       -.317     6.963
  5       8.95        1.30       -.162    10.423
  6      10.80        1.35        0       14.567
  7      12.65        1.40       +.162    19.479

Wins  and  losses  were  determined  by  means  of  a  roulette  wheel  which  was  respun 
if  0  or  00  occurred,  such  that  "win  on  4"  (numbers)  means  a  winning  probability 
of  4/36  =  1/9,  etc. 

Seghers,  Fryback  &  Goodman's  lists  varied  in 

(1)  expected  value  (EV), 

(2)  range  of  outcomes  (A-B), 

(3) step size of expectation increase (ΔEV),

(4) position of the maximal EV bet (OBP).

Dependent variables were:

(a) choice of most preferred gamble,

(b) rank orderings of the sets of 7 gambles.


Although the experimental design looks as though a factorial design ANOVA had
been planned, the data don't permit such an analysis. A frequency analysis
as suggested by Sutcliffe (1957) would be more appropriate; however, low
expected cell frequencies in the overall contingency table prohibit such an
analysis.


A  Bayesian  data  analysis  is  suggested  as  an  alternative. 

However, since Seghers, Fryback & Goodman assume a deterministic decision
making model, this analysis runs into the problems mentioned before. The
simple probabilistic choice model used to analyze Hommers' data is no longer
appropriate here since there are negative expectations which are not compatible
with a BTL choice model based on these expectations as scale values.

Deterministic decision making models predict choice of the optimal gamble
with probability 1, and of all other alternatives with probability 0:

$$
P(\text{choice of gamble } g_j) =
\begin{cases}
1 & \text{if } g_j \text{ is optimal} \\
0 & \text{else}
\end{cases}
$$


where optimal is defined in the context of the respective decision making
model to be tested, e.g., it would be the maximum EV bet under the expectation
maximization model, or the ideal risk bet under assumption of Coombs' Portfolio
Theory. Unfortunately, likelihoods of 0 or 1 cannot be handled by the Bayesian
data analysis model. Thus, we have to modify these models somehow to get away
from the 0-1 likelihoods. There are several ways to do so of which we will
try to

(1) keep the deterministic model in principle, but dilute the too peaked
0-1 likelihood function by allowing for some error variance,

(2) modify the deterministic hypothesis somewhat arbitrarily to smooth
its peak, following an example given by Pitz (1968), who encountered
a similar problem,

(3) abandon the deterministic model completely in favor of some
probabilistic choice model (as they have been used for riskless choices
for a long time),

(4) replace the deterministic model by some hybrid of deterministic and
probabilistic components.

We  will  explore  all  these  possibilities  in  turn. 

(1): Introducing error variance: Our suggestion is to dilute the too
peaked likelihood functions somewhat by allowing for error variance: The
diluted model no longer assumes Ss always pick the maximal EV gamble, but
rather assumes that Ss err sometimes in the sense that they don't choose a
certain gamble although they mean to choose it.

Fortunately, the data by Seghers, Fryback & Goodman provide a way to
estimate these error rates: they had their Ss do the task twice. Our suggestion
is to use the observed discrepancies between first and second choice (under
otherwise equal conditions) as estimates of error rates. To do so, the Ss'
first and second choices of gambles are tallied in 7x7 confusion matrices,
separately for each given position of optimal EV bet (OBP). A completely
consistent S should make the same choice on both occasions: i.e., all entries
should be in the main diagonal, and all other cells should be empty. Every
deviation from this diagonal matrix is considered an "error," an inconsistency,
a deviation of the S from his pure strategy assumed under the hypothesis of
expectation maximization, $H_1$. Assuming that Ss err at both choices, i.e.,
both 1st and 2nd choices have a chance to deviate from the Ss' true choice
predicted by his strategy, we take the average of row and column distribution
for each stimulus as its error distribution.

This procedure assumes that, on the 2 days, S at least once chooses his
"ideal bet" without making an error. It does not take into account those cases
where S "wants to" select a certain bet but "misses" on both days. This may
lead to an underestimation of error rates. A better way would be to get
confusion probability estimates from more often repeated choices, in a complete
pair comparison matrix, or from a different task, like the procedure used in
DeSoto & Bosley (1962) (quoted in Coombs, Dawes & Tversky, 1970, p. 68 ff.).
This cannot be done with these data, but it could be in future experiments,
if you want to make the assumption that confusion of memory traces is
representative of confusion in choices.

Now, with this knowledge about S's error probabilities, we can modify the
0-1 distribution under the former pure expectation maximization hypothesis:
We diminish the peak of the distribution (formerly $P(D \mid H_1) = 1$ at the
maximal EV bet) by replacing the 1 by the repetition rate (1st choice = 2nd
choice) in the 1st choice/2nd choice confusion matrix, and by replacing the
zeroes by the relative frequencies with which Ss have chosen the respective
gambles "erroneously."

Thus, the EV maximization hypothesis $H_1$ implies data probabilities of

$P(D_0 \mid H_1)$ = the repetition probability of the maximal EV bet, for the
maximal EV bet ($D_0$) chosen,

and

$P(D_i \mid H_1)$, $i \neq 0$, = the probability of choosing $D_i$ given S has
chosen $D_0$ on the same trial in the 1st or 2nd repetition.

($\sum_i P(D_i \mid H_1)$ should be 1 if everything is correct.) Analogous
computations can be done for other alternative hypotheses, like variance
preference, also.

Tables 12 and 13 give examples of such confusion matrices between 1st and
2nd choice: Table 12 shows absolute frequencies; Table 13 is the same matrix
with a matrix of ones added to it. (Actually, the entries in Table 12 are
averaged over 2 presentations.)

The rationale for adding these ones to the cells is again a Bayesian one:
we are revising here, in principle, Dirichlet distributions (see, e.g., Novick &
Grizzle, 1965). We start with a uniform (flat) prior distribution
D(1, 1, 1, 1, 1, 1, 1) with all parameters equal to 1, and then add to them the
numbers of observations to obtain the parameters of the posterior distribution
after Bayesian revision. However, summing cell entries from row and column
would assume independence of observations from the two sessions, which probably
is not given since we assume that S's choices were influenced by the same
preference structure on both days. Thus, to avoid an overly peaked Dirichlet
distribution, we average over column and row entry rather than adding them up.
Actually, this does not make a difference as long as we calculate only means
and not variances.
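
The last remark can be checked with a short sketch using the standard formulas for the mean and variance of a Dirichlet distribution. The function names and the parameter vector are hypothetical; halving all parameters stands in for averaging row and column entries instead of adding them.

```python
def dirichlet_mean(alpha):
    """Mean of each component of a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

def dirichlet_variance(alpha):
    """Variance of each component: a_i (a_0 - a_i) / (a_0^2 (a_0 + 1))."""
    total = sum(alpha)
    return [a * (total - a) / (total ** 2 * (total + 1.0)) for a in alpha]

# Hypothetical posterior parameters obtained by adding row and column entries.
alpha_summed = [4.0, 2.0, 2.0]
# Averaging instead of adding halves every parameter.
alpha_averaged = [a / 2.0 for a in alpha_summed]

print(dirichlet_mean(alpha_summed), dirichlet_mean(alpha_averaged))
print(dirichlet_variance(alpha_summed), dirichlet_variance(alpha_averaged))
```

The means coincide, while the averaged parameters give larger variances, i.e. a less peaked distribution, which is the stated reason for averaging.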
As an illustration, assuming that gamble #1 is the optimal bet in the Ss'
view (H ), and having observed the number of choices displayed in Table 13, we
get:

Table 14

for gamble #:          1       2       3       4       5       6       7

from row 1:          117.5    15.5     8.5     3.5     2.5     4       6
from column 1:       117.5    14.5    10.5     6       4.5     2       9
sum of both:         235      28      19       9.5     7       6      15
average:             117.5    14       9.5     4.25    3.5     3       7.5

and thus the choice
probabilities:        .734    .088    .060    .027    .022    .019    .047

when gamble #1 is the "true choice" assumed by the model.
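
The tally behind Table 14 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the original computation: the function name `diluted_choice_probs` and the three-gamble example counts are hypothetical, and the inputs are assumed to be the row and the column of the ones-added confusion matrix that belong to the assumed "true choice".

```python
def diluted_choice_probs(row, col):
    """Average a confusion-matrix row and column, then normalize.

    row, col: ones-added counts (Dirichlet parameters) for the gamble
    assumed to be the "true choice"; both must have the same length.
    """
    if len(row) != len(col):
        raise ValueError("row and column must have the same length")
    # Average rather than sum, to avoid an overly peaked Dirichlet.
    avg = [(r + c) / 2.0 for r, c in zip(row, col)]
    total = sum(avg)
    # The Dirichlet mean is each parameter divided by the parameter sum.
    return [a / total for a in avg]


# Hypothetical ones-added counts for a list of three gambles.
example = diluted_choice_probs([9.0, 2.0, 1.0], [9.0, 1.0, 2.0])
```

Because only the normalized means are used, dividing the summed entries by two changes nothing in the resulting probabilities; it matters only if the spread of the Dirichlet distribution is of interest.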

Some results of such tallies are reproduced in Table 15, assuming various
choice strategies on the side of the Ss. Column 2 displays choice probabilities
under an a priori random-choice null hypothesis (all gambles chosen with equal
probability 1/7 = .143).

Table 15


$H_1$: maximize EV;
maximal EV is in gamble


.112

.818  .054

.080

.063  .117

.031

.025  .652


Columns 3 through 6 are the diluted choice probabilities assuming
expectation maximization with some errors, calculated in the manner described
above from confusion matrices between choices in first and second sessions
of Ss but tallied separately for lists where gambles 1, 3, 5, and 7 were
optimal, respectively.

Column 7 is calculated from the tallies illustrated in Tables 12, 13, and
14, assuming that Ss have the strategy of always picking gamble #1, no matter
what the parameters of the gambles in the list are.

Columns 8 through 10 are choice probabilities calculated under similar
hypotheses, assuming that Ss have preferences for certain regions of the lists
of gambles presented to them, i.e., that they always pick gambles #1-3, or
#5-7, or #3-5, respectively.

With the choice probabilities from Table 15 taken as $P(D \mid H_i)$, all these
models can be tested against each other by calculating the respective likelihood
ratios. To make the analysis more convenient, all hypotheses could be tested
first against the random-choice null hypothesis ($H_0$). The resulting likelihood
ratios against $H_0$ could then be divided by each other to yield likelihood
ratios against each other since

$$\frac{P(D \mid H_i) \,/\, P(D \mid H_0)}{P(D \mid H_j) \,/\, P(D \mid H_0)} = \frac{P(D \mid H_i)}{P(D \mid H_j)}$$

However, this is only feasible as far as $H_i$ and $H_j$ are mutually exclusive.
$H_1$, $H_2$ and $H_3$ in Table 15 are not since they all assume a strategy to choose
gamble #1.


The choice probabilities assumed under hypotheses $H_1$ through $H_5$ from Table
15 yield the likelihood ratios reproduced in Table 16 if tested against the
uniform distribution $H_0$.

To use Table 16, we multiply the prior odds by the respective entry every time
the respective datum comes up; e.g., to test a hypothesis against $H_0$, we would
multiply prior odds (i.e., odds so far obtained) by 5.14 if S chooses gamble
#1, and gamble #1 is optimal (maximal EV) in the respective list.


Table  16:  Likelihood  ratios  calculated  from  Table  15 


Again, it will be more convenient to do this in terms of logarithms; thus
we have, in Table 17, the $\log LR_{1/0}$ in column 3, and the number of choices for
the respective gamble in column 2.


Table 17

(1)        (2)          (3)             (4)                (5)
gamble     number of    log LR_{1/0}    log LR_{2/0}       log LR_{4/0}
           choices

1                       -  .1938        +  .7110           -  .1079
2                       -  .5528        -  .2076           -  .4202
3                       -  .4949        -  .3768           -  .2518
4                       -  .4559        -  .6778           -  .0862
5                       -  .5528        -  .8239           +  .0755
6                       -  .3665        -  .8861           +  .0755
7                       +  .6712        -  .4815           +  .0755

log LR                  + 6.6657        - 8.9087           +  .2838
LR                      4.631*10^6      1/(8.104*10^8)     1.922

The data in column 2 are the choices made by 12 Ss in 2 sessions among
the gambles of list #1, reproduced in Table 11, where gamble #7 had maximal
EV, such that the logarithms in column 3 of Table 17 are those of the
likelihood ratios in column 5 of Table 16. The sum of the products of entries in
columns 2 and 3 of Table 17, the overall log likelihood ratio, is 6.6657,
indicating a likelihood ratio of $4.631 \times 10^6$ in favor of expectation maximization
($H_1$) over random choice ($H_0$).
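
The "sum of products" described above can be sketched in Python. The sketch is illustrative only: the function name `overall_log_lr`, the choice counts, and the probability vector are hypothetical, since the counts of column 2 are not reproduced here. Logarithms are taken to base 10, as in Table 17.

```python
import math


def overall_log_lr(counts, probs_h, probs_null=None):
    """Return sum_i n_i * log10(p_i / q_i) for multinomial choice data.

    counts:     number of times each gamble was chosen
    probs_h:    choice probabilities under the tested hypothesis
    probs_null: choice probabilities under the reference hypothesis;
                defaults to random choice (1/k for each of k gambles)
    All probabilities of chosen gambles must be > 0, which is why the
    error-diluted probabilities are needed.
    """
    k = len(counts)
    if probs_null is None:
        probs_null = [1.0 / k] * k
    total = 0.0
    for n, p, q in zip(counts, probs_h, probs_null):
        if n == 0:
            continue  # gambles never chosen contribute nothing
        total += n * math.log10(p / q)
    return total


# Hypothetical data: 12 choices among 7 gambles, gamble #7 optimal.
counts = [1, 0, 0, 0, 0, 1, 10]
probs = [0.05, 0.05, 0.05, 0.05, 0.05, 0.10, 0.65]
log_lr = overall_log_lr(counts, probs)
lr = 10 ** log_lr  # likelihood ratio in favor of the tested hypothesis
```

Working in logarithms keeps the running total well within floating-point range even when the likelihood ratio itself becomes very large or very small.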

Columns 4 and 5 show the respective log LR for hypothesis $H_2$ (always pick
gamble #1) over the random choice hypothesis $H_0$, and for hypothesis $H_4$ (always
pick gamble #5, 6, or 7) against the random choice hypothesis $H_0$. The resulting
likelihood ratios are $LR_{0/2} = 8.104 \times 10^8$ in favor of $H_0$ (random choice) over $H_2$
(always pick gamble #1) with these data, and $LR_{4/0} = 1.922$ in favor of $H_4$
(always pick #5, 6, or 7) over $H_0$ (random choice).

So far, we have analyzed only the choices among gambles of one list. Of
course, it is feasible and advisable to do it over the whole set of data from
all lists, simply by summing up the respective $\log LR_{i/0}$ over all data for the
various hypotheses $H_i$. Seghers, Fryback & Goodman have done this for each of
their Ss individually, and we are reproducing their results for one of their
Ss as an example in Table 18. Besides calculating likelihood ratios $LR_{i/0}$ for
the aforementioned hypotheses against the random choice hypothesis $H_0$ over
all lists (column 2), they also did it for specified subsets of lists, e.g.,
lists with high EV (column 3), lists with low EV (column 4), lists with high
EV differences between gambles in the lists (column 5), lists with low EV
differences (column 6), lists of gambles with large variances (range of bet, i.e.,
|win-loss|) (column 7), and lists of gambles with small variances (column 8).
Thus, it is possible to compare data likelihood for the various hypotheses $H_i$
under different stimulus conditions.


This breaking down of likelihood ratio analyses into analyses over mutually
exclusive subsets of the whole data set corresponds roughly to what is done to
the sum of squares in analysis of variance (ANOVA), or to the chi square in
analyses of multi-dimensional contingency tables (e.g., see Sutcliffe, 1957):
It shows how much the respective subsets of data (i.e., data under specific
conditions) contribute to the overall likelihood ratio. To make fair
comparisons of this kind, we have to take care that these subsets are of equal
size.

The product of the likelihood ratios $LR_{i/j}$ of competing hypotheses $H_i$, $H_j$
from exhaustive and mutually exclusive subsets of data equals their likelihood
ratio over the whole data set. E.g., in each row of Table 18, the products of
entries in columns 3 and 4, 5 and 6, or 7 and 8 equal each other, and equal the
entry of column 2, except for rounding errors. (This provides, by the way, an
easy means of checking computations.)
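
The checking rule just stated can be illustrated with a short Python sketch. All numbers and names below are hypothetical (three alternatives, two made-up subsets of choices); the point is only that the likelihood ratios of two exhaustive, mutually exclusive subsets multiply to the likelihood ratio of the pooled data.

```python
import math


def log_lr(counts, p, q):
    """log10 likelihood ratio of hypothesis p against hypothesis q."""
    return sum(n * math.log10(pi / qi)
               for n, pi, qi in zip(counts, p, q) if n > 0)


# Hypothetical choice probabilities under two competing hypotheses.
p_i = [0.6, 0.3, 0.1]
p_j = [1 / 3, 1 / 3, 1 / 3]

# Two mutually exclusive subsets of choices that exhaust the data set.
subset_a = [5, 2, 1]
subset_b = [3, 3, 2]
whole = [a + b for a, b in zip(subset_a, subset_b)]

lr_a = 10 ** log_lr(subset_a, p_i, p_j)
lr_b = 10 ** log_lr(subset_b, p_i, p_j)
lr_whole = 10 ** log_lr(whole, p_i, p_j)

# The product over the subsets reproduces the overall likelihood ratio.
product_matches = abs(lr_a * lr_b - lr_whole) <= 1e-9 * abs(lr_whole)
```

The identity holds because each observation contributes one factor to the likelihood ratio, so any partition of the observations only regroups the same factors.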

The  results  of  such  likelihood  ratio  analyses  over  the  subsets  of  data 
can  be  used  to  find  out  under  which  conditions  which  hypotheses  are  how  much 
more  likely  than  others,  and  thus  may  lead  to  more  specific  theories  about  the 
underlying  pattern  of  behavior. 

The comparison of likelihood ratio analysis to more conventional methods
like ANOVA is not always straightforward; the most easily comparable traditional
technique would be a frequency analysis because it deals with the frequencies
of occurrence of events, which enter the likelihood ratio analysis directly (as
exponents).

Seghers, Fryback & Goodman did analyses of variance over the same data we
used for demonstration in Table 18, both in terms of bet number as dependent
variable, and in terms of absolute deviation of bet number chosen
from maximal EV bet number in the respective list. Results (for the same S,
and same session as in Table 18) are shown in Table 19.

Seghers, Fryback & Goodman's lists were constructed in such a way that,
given the maximal EV bet in the list (in positions #1, #3, #5, or #7 of the
list = optimal bet position OBP), the adjacent gambles decreased in EV to both
sides by a step size DEV = difference in expected value. Thus, the dependent
variable "absolute deviation of number of bet chosen from number of maximal EV
bet" can be considered a measure of S's deviation from expectation maximization
behavior.

Whereas independent variables such as "high level of maximal EV in list"
versus "low level of maximal EV in list" (first line in Table 19), large step
size of EV differences in list versus small step size (line 2 in Table 19),
and range of outcomes of gambles (line 3 in Table 19) show no significant
difference in the dependent variables, there are some differences between the
contributions of the respective subsets of data to the likelihood ratio between
expectation maximization and random choice hypotheses in Table 18 (line 1).
However, we have no means to compare these two kinds of analyses quantitatively.

Testing the various hypotheses about choice behavior against the random
choice hypothesis $H_0$ is the approach to their evaluation that comes closest to
traditional hypothesis testing. Testing them against the most descriptive
choice probabilities is another possibility these likelihood analyses offer
for which no counterpart exists in traditional statistics.

Comparisons of data likelihoods under the various hypotheses aforementioned
to these (by definition) maximal likelihoods can show how far the hypotheses
$H_i$ deviate from actual behavior. These most descriptive choice probabilities
specify upper bounds for data likelihoods under the choice hypotheses, as
illustrated in Figure 1.


Figure 1: Data likelihood L(D) under random choice (uniform p), under the normative models H, and under the optimal description (maximum likelihood p).

The most descriptive (maximum likelihood) vector of choice probabilities
for the seven gambles can be obtained for each subject from his choices by
the following method: the data (choices of one out of seven gambles in each
list) are generated by a multinomial distribution, with choice probabilities
$p_j$ following a Dirichlet distribution. Thus we can assume a flat Dirichlet
distribution D(1, 1, 1, 1, 1, 1, 1) as prior, a multinomial data generating
process yielding $x_j$ choices of gamble $g_j$, and thus leading (via a Bayesian
probability distribution revision) to a Dirichlet posterior distribution,
$D(x_1 + 1, x_2 + 1, x_3 + 1, x_4 + 1, x_5 + 1, x_6 + 1, x_7 + 1)$. This Dirichlet
posterior distribution gives us the probability $P(p \mid x)$ of the vector of choice
probabilities $(p_1, p_2, p_3, p_4, p_5, p_6, p_7) = p$ of gambles $g_1$ through $g_7$, given
the vector of observed choice frequencies $(x_1, x_2, x_3, x_4, x_5, x_6, x_7) = x$,
and what we need is that vector $p_0$ for which $P(p \mid x)$ is maximal over the space
of all possible $p$. (Note that this space is restricted by $\sum_j p_j = 1$ for each $p$.)
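
A minimal Python sketch of this step follows. It relies on the standard result that the mode of a Dirichlet distribution with parameters $\alpha_j \geq 1$ lies at $(\alpha_j - 1) / (\sum_k \alpha_k - K)$, so that under the flat prior the most descriptive vector is simply the vector of relative choice frequencies. The function names and the choice counts are hypothetical, and the multinomial coefficient is omitted from the log likelihood because it cancels in every likelihood ratio.

```python
import math


def dirichlet_posterior(counts, prior=None):
    """Add observed counts to the prior parameters (flat prior by default)."""
    if prior is None:
        prior = [1.0] * len(counts)
    return [a + x for a, x in zip(prior, counts)]


def dirichlet_mode(alpha):
    """Mode of a Dirichlet distribution with all parameters >= 1."""
    k = len(alpha)
    denom = sum(alpha) - k
    if min(alpha) < 1 or denom <= 0:
        raise ValueError("mode formula needs alpha_j >= 1 and some data")
    return [(a - 1.0) / denom for a in alpha]


def log10_likelihood(counts, probs):
    """log10 of prod_j p_j ** x_j (multinomial coefficient omitted)."""
    return sum(n * math.log10(p) for n, p in zip(counts, probs) if n > 0)


# Hypothetical choice frequencies of one S over 64 lists of 7 gambles.
counts = [40, 8, 6, 4, 3, 2, 1]
p_best = dirichlet_mode(dirichlet_posterior(counts))
ll_best = log10_likelihood(counts, p_best)
ll_random = log10_likelihood(counts, [1.0 / 7] * 7)
```

By construction `ll_best` is an upper bound: no other probability vector, and hence no choice hypothesis, can give these data a higher likelihood.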

We take S #1 of Seghers, Fryback & Goodman, again, as an example. His
(or, rather, her) choices are reproduced in columns 2, 5, 8, and 11 for the
respective OBP conditions, and summed up in column 14 of Table 20. Columns
3, 6, 9, and 12 contain the choice probabilities under the diluted expectation
maximization hypothesis from Table 15; in columns 4, 7, 10, and 13 we find
the corresponding logarithms. The log likelihood for expectation maximization
calculated from these figures is -49.7932. The S's most descriptive strategy,
computed as outlined in the preceding paragraph, is given in column 16,
with the corresponding logarithms in column 17. The log likelihood from
these figures (which is the maximal attainable) is -44.3123, and the log
likelihood of this S's choices under the random choice hypothesis $H_0$ is
64 × log 1/7 = -54.0608. The expectation maximization hypothesis ($H_1$) comes
much closer to the subject's most descriptive strategy than to the random
choice strategy ($H_0$). The respective likelihood ratios are

LR (most descriptive strategy over $H_1$) = $3.026 \times 10^5$

$LR_{1/0}$ = $1.852 \times 10^4$

LR (most descriptive strategy over $H_0$) = $5.604 \times 10^9$
We have so far used the assumption that Ss occasionally deviate from
their ideal choice and make "errors" in their decisions, which we could use
to get rid of the choice probabilities of 0 and 1 assumed by the deterministic
normative models of decision making.
Expectation Preference Model


In discussing Hommers' paper, we have seen that the assumption of
probabilistic preference models rather than deterministic choice models is another
feasible way to avoid choice probabilities of 0 and 1.

For gambles of the form $g_j = (w_j, p_j, l_j)$, where S wins the payoff $w_j$
with probability $p_j$ and loses $l_j$ with probability $(1 - p_j)$, this model assumes
that Ss choose a gamble $g_j$ with probability $P(g_j)$ proportional to the relative
utility $U(g_j)$ of the gamble $g_j$,

$$P(g_j) = U(g_j) \Big/ \sum_k U(g_k),$$

where

$$U(g_j) = EV(g_j) = p_j w_j + (1 - p_j)\, l_j$$

under the expectation preference model. For each choice of $g_j$ an S makes,
$P(g_j)$ is the likelihood of this observation under the assumption of
this model.
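
The expectation preference model can be sketched in Python as follows. The function names and the three example gambles are hypothetical; the loss $l_j$ is assumed to be entered as a signed amount (negative for a loss), in line with the EV formula above.

```python
def expected_value(w, p, l):
    """EV of a gamble paying w with probability p and l otherwise.

    l is a signed amount, i.e., negative when S loses money.
    """
    return p * w + (1.0 - p) * l


def expectation_preference_probs(gambles):
    """Choice probabilities proportional to the EVs of the gambles.

    gambles: list of (w, p, l) triples. The model is only defined when
    every EV is strictly positive.
    """
    utilities = [expected_value(w, p, l) for (w, p, l) in gambles]
    if min(utilities) <= 0:
        raise ValueError("all EVs must be positive for this model")
    total = sum(utilities)
    return [u / total for u in utilities]


# Hypothetical list of three gambles with EVs 4, 3, and 2.
gambles = [(10.0, 0.5, -2.0), (6.0, 0.5, 0.0), (4.0, 0.75, -4.0)]
choice_probs = expectation_preference_probs(gambles)
```

The explicit check on the sign of the EVs anticipates the limitation of the model for lists that contain gambles with zero or negative expected value.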

This  expectation  preference  model  works  fairly  well  for  sets  of  gambles 
where  all  EVs  are  positive,  as  we  have  seen  in  the  analysis  of  Hommers’  data. 
However,  it  will  run  into  difficulties  if  the  EV  of  one  or  more  gambles  in 
the  list  (set  of  choice  alternatives)  is  negative  or  zero. 

A Thurstonian (rather than Lucean) choice model might help in this case.
Here, choice probabilities are only dependent on differences between utilities
of choice alternatives, and not on their absolute values. Under the
assumptions of this model, the probability of choosing one element (i.e., a
gamble) in a pair of alternatives is equal to the integral of the normal
distribution from $-\infty$ to the difference in utilities (expected values) of the
respective pair, where the mean of this normal distribution is 0, and its
variance is the variance of the utility difference, which is the sum of the
variances of the discriminal dispersions of the two elements (gambles) in the
pair, if we assume independence (uncorrelatedness) of these two discriminal
processes. Application of this model requires estimation of these variances,
which can be obtained from repeated choices.
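
For a single pair of gambles, the Thurstonian choice probability described above can be sketched in Python. The function names and the numerical values are hypothetical; the variances of the two discriminal processes are assumed to be known (in practice they would have to be estimated from repeated choices).

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thurstone_pair_prob(u_a, u_b, var_a, var_b):
    """Probability of choosing gamble a over gamble b.

    u_a, u_b:     utilities (expected values) of the two gambles
    var_a, var_b: variances of the two discriminal processes, assumed
                  independent, so the difference has variance var_a + var_b
    """
    sd_diff = math.sqrt(var_a + var_b)
    return normal_cdf((u_a - u_b) / sd_diff)


# Works for negative EVs too, since only the difference matters.
p_ab = thurstone_pair_prob(-1.0, -2.0, 0.5, 0.5)
p_ba = thurstone_pair_prob(-2.0, -1.0, 0.5, 0.5)
```

Extending this pairwise rule to a choice among seven gambles would need further assumptions that the text does not specify, so the sketch stops at the pair.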

Regret  Avoidance  Models 

A  way  to  apply  a  Lucean  choice  model  to  choices  among  bets  including 
gambles  with  EV  <  0  might  be  to  consider  regrets  rather  than  payoffs.  Regrets 
are  obtained  from  payoffs  by  reducing  them  by  the  maximal  amount  obtainable 
with  each  given  state  of  world.  Regrets  calculated  by  this  method  are  all 
negative; they are measures of undesirability rather than desirability. Thus,
it  does  not  make  sense  to  assume  choice  probabilities  proportional  to  regrets. 
What  we  need  is  some  antitone  transformation  on  the  regrets  which  leads  to  high 
choice  probabilities  for  low  regrets,  and  low  choice  probabilities  for  large 
regrets.  We  propose  three  simple  models  for  this  purpose: 

(a) the sum-difference regret model assumes that choice probabilities are
proportional to the deviation of the respective expected regrets from the sum
of all regrets,

$$P(i) = \frac{\sum_j r_j - r_i}{(N - 1) \sum_j r_j},$$

where $r_i$ is the expected regret associated with the $i$th alternative, the smallest
regret being 0, and $N$ is the number of alternatives. Model (a) gives choice probabilities
with a rather small variance, i.e., the choice probabilities are not very
sensitive to differences in regrets.

(b) the reciprocal regret model assumes that choice probabilities are
proportional to the reciprocals of the respective expected regrets,

$$P(i) = \frac{1}{r_i \sum_j (1 / r_j)}.$$

This leaves $P(i)$ for $r_i = 0$ undefined. Model (b) leads to stronger deviations
of choice probabilities from a uniform distribution over alternatives in response
to differences in regrets, i.e., model (b) is more sensitive, but cannot always be
used because it leaves the choice probability for an expected regret of 0
undefined.

(c) the max-difference model assumes that choice probabilities for
alternatives $i$ are proportional to the differences between the respective
expected regrets and the maximal expected regret,

$$P(i) = \frac{\max_j [r_j] - r_i}{N \max_j [r_j] - \sum_j r_j}.$$

This model is more sensitive to differences in expected regrets than model
(a) and leaves no choice probabilities undefined as does model (b), but
leads to a 0 choice probability for the maximal expected regret alternative.
This is an undesirable consequence for a BTL choice model but may be quite
realistic. In the data analysis, it will hurt only if any S picks the maximum
expected regret gamble.
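
The three regret avoidance models can be sketched side by side in Python. The function names and the example regrets are hypothetical, and the expected regrets are assumed to be entered as non-negative magnitudes (smallest regret 0, larger values meaning more regret).

```python
def sum_difference_probs(regrets):
    """Model (a): P(i) = (sum(r) - r_i) / ((N - 1) * sum(r))."""
    n, total = len(regrets), sum(regrets)
    if total == 0:
        raise ValueError("model (a) is undefined when all regrets are 0")
    return [(total - r) / ((n - 1) * total) for r in regrets]


def reciprocal_probs(regrets):
    """Model (b): P(i) proportional to 1 / r_i; needs every r_i > 0."""
    if min(regrets) <= 0:
        raise ValueError("model (b) is undefined for an expected regret of 0")
    inverse = [1.0 / r for r in regrets]
    total = sum(inverse)
    return [v / total for v in inverse]


def max_difference_probs(regrets):
    """Model (c): P(i) = (max(r) - r_i) / (N * max(r) - sum(r))."""
    n, largest = len(regrets), max(regrets)
    denom = n * largest - sum(regrets)
    if denom == 0:
        raise ValueError("model (c) is undefined when all regrets are equal")
    return [(largest - r) / denom for r in regrets]


# Hypothetical expected regrets for a list of three alternatives.
probs_a = sum_difference_probs([0.0, 1.0, 3.0])
probs_c = max_difference_probs([0.0, 1.0, 3.0])
probs_b = reciprocal_probs([1.0, 2.0, 4.0])
```

With the same regrets, model (c) spreads the probabilities further apart than model (a) and assigns probability 0 to the alternative with the maximal expected regret, which is the behavior discussed above.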

For the example of list #1 from Seghers, Fryback & Goodman (see Table 11),
Table 21 shows the respective choice probabilities with these probabilistic
regret avoidance models in columns 8, 11, and 14, with the corresponding
logarithms in columns 9, 12, and 15. Column 17 displays the choice probabilities
under the error-diluted deterministic expectation maximization hypothesis $H_1$ as
given in Table 15, and column 18 of Table 21 contains their logarithms. In
column 19, we have the actual numbers of choices made by S in this list of
gambles, for which we calculated the likelihoods under the hypotheses of
random choice ($H_0$), diluted expectation maximization ($H_1$), reciprocal regret,
sum-difference regret, and max-difference regret. Table 22 displays
the pairwise likelihood ratios between these hypotheses.

As we can see, the data are 1067 times more likely under the diluted
deterministic expectation maximization hypothesis $H_1$ than under the most favored
probabilistic regret-avoidance hypothesis. The data likelihood under the
least favored probabilistic regret-avoidance hypothesis is almost as large
as under the random choice assumption $H_0$, LR ≈ 1.111.

This indicates that for likelihood ratio analyses of choices among bets
made by adult subjects, error-diluted deterministic expectation maximization
models seem much more likely than probabilistic preference models. However,
in the case of Hommers' data, where no source to estimate the error rate was
available, probabilistic preference models proved quite useful. It should be
mentioned that neither of these studies was originally designed for a
likelihood ratio analysis; if this had been the case, adequate measures would
have been provided beforehand.

Pitz (1968) found another way of handling the problem of data probabilities
of 0 and 1, in another context, but also with data originally not observed
with a likelihood ratio analysis in mind. He tested a (null) hypothesis $H_0$ of
equal probability of two kinds of observations ($p = 0.5$) against the rather
unspecific hypothesis $H_1$ of $p > 0.5$. The data showed that 32 out of 48 Ss
gave responses in accordance with $H_1$. The likelihood ratio for these data would
have been

$$L = \frac{.5^{48}}{p_1^{32} (1 - p_1)^{16}}.$$

From this equation Pitz determined the value of $p_1$ for which the data would
be equivocal, i.e., for which $L$ would be one: $.5^{48} = p_1^{32} (1 - p_1)^{16} \Rightarrow p_1 \approx .8$.
(That means: if $H_1$ meant $p > .8$, the data would actually favor
$H_0$ rather than $H_1$.) Pitz's suggestion is to consider $p_1$ as a distribution
$g(p)$ over $p$ rather than a constant $p_1$, such that the likelihood ratio is

$$L = \frac{.5^{48}}{\int_{.5}^{1.0} p^{32} (1 - p)^{16}\, g(p)\, dp},$$

and he proposes several possible distributions $g(p)$, such as a uniform
(rectangular) distribution over [.5, 1.0], a triangular distribution with
$g(p) = 0$ for $p < .5$, and a kind of beta distribution with a rather high
mean. Such an analysis could be done with the Seghers, Fryback & Goodman
data, too.
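
Both steps of this argument can be sketched numerically in Python. The sketch is illustrative and makes its own choices that are not taken from Pitz: the equivocal $p_1$ is found by bisection, the integral is approximated with Simpson's rule, and only the uniform $g(p)$ over [.5, 1.0] is shown. All function names are hypothetical.

```python
def likelihood_h1(p, successes=32, failures=16):
    """Data likelihood p**32 * (1 - p)**16 for a fixed p."""
    return p ** successes * (1.0 - p) ** failures


def equivocal_p(tol=1e-10):
    """Bisection for the p1 > 2/3 at which L = 1.

    On (2/3, 1) the likelihood decreases from its maximum to 0, so it
    crosses .5**48 exactly once in that interval.
    """
    target = 0.5 ** 48
    lo, hi = 2.0 / 3.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if likelihood_h1(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def lr_uniform_prior():
    """L for H_0 over H_1 when g(p) is uniform over [.5, 1.0]."""
    density = 1.0 / (1.0 - 0.5)  # height of the rectangular g(p)
    marginal = simpson(lambda p: likelihood_h1(p) * density, 0.5, 1.0)
    return 0.5 ** 48 / marginal


p1 = equivocal_p()
L_uniform = lr_uniform_prior()
```

Other choices of $g(p)$, such as the triangular or beta-like distributions mentioned above, could be handled by replacing the constant `density` with the corresponding density function.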
Conclusion


Now that we have seen that we can figure likelihood ratios between various
competing hypotheses on given data sets which were not even made for it, what
do we do now?

For a complete Bayesian data analysis, we would multiply our computed
likelihood ratios by some prior odds for the respective hypotheses. These
prior odds may be more or less public, or may be our very personal belief
states. Methods to elicit and assess such prior distributions have been
introduced and discussed elsewhere (e.g., Winkler, 1967; Stael von Holstein,
1970).

For a complete Bayesian analysis, we would consider the possible
consequences of our decisions between competing hypotheses, in terms of utilities
assigned to the various combinations of our decisions among hypotheses with the
possible true states of the world, and use these utilities in connection
with our prior odds to determine cutoffs for the likelihood ratios at which to
decide in favor of one hypothesis or model or another. There are various techniques
available now for the assessment of utilities of outcomes, even if these
outcomes are characterized by several relevant attributes. These techniques have
been summarized recently by Fischer (1972).

As we have seen in the few examples given in this paper, likelihood ratios
grow rather rapidly with larger amounts of data. Even very biased prior
odds would be brought very soon into the correct range by multiplication by
these large likelihood ratios. This indicates that Bayesian analyses might get
along with much smaller sample sizes than traditional statistical data analyses
with their diffuse alternative hypotheses. How much precisely can be
economized on the sample size will depend in each case on the cutoff determined
by prior odds and costs and payoffs (utilities) involved, as indicated by a
proper decision analysis (see, e.g., Raiffa, 1969). That a careful formulation
of competing hypotheses alone can result in considerable savings on expected
sample size has been shown by Wald (1947) already.
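
The revision of prior odds by likelihood ratios can be illustrated with a very small Python sketch. The function name and the prior odds of 1:1000 are hypothetical; the likelihood ratio of $4.631 \times 10^6$ is the one obtained for expectation maximization over random choice in Table 17.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by a sequence of likelihood ratios.

    The product is accumulated in log10 terms, which stays numerically
    stable when many large or small ratios are combined.
    """
    log_odds = math.log10(prior_odds)
    for lr in likelihood_ratios:
        log_odds += math.log10(lr)
    return 10 ** log_odds


# Hypothetical prior odds of 1:1000 against expectation maximization,
# revised by the likelihood ratio reported for list #1.
odds = posterior_odds(1e-3, [4.631e6])
```

Even with a prior this strongly biased against the hypothesis, the data from a single list are enough to move the odds well above 1 in its favor.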